This project is meant to help anyone who wants to explore U.S. wages, learn how to bridge data analysis to visualization, study distribution modeling, or build a complete web app inside a single HTML page using only vanilla JavaScript.
This document is more than a guide to the interface and settings, and more than an explanation of the occupation tree structure. It is also a filterable walk-through of how the project was made. Depending on your preferences, it can reveal or hide details of the methods used to build CDF and PDF curves from quantiles, drawing on statistics, math (calculus and geometry), data visualization practice and guidelines, and the supporting technologies: Python, HTML, CSS, and JavaScript.
The core visual element of the project is the violin, a chart type usually used to encode an estimated distribution from raw data. In this project, however, the violins do the opposite. They act as a data disaggregator, reconstructing an approximate distribution shape from a handful of values. That reconstruction has two working routes in the project: a direct method that builds a smooth, constrained national shape from the national quantiles, and a mixture method that builds the national shape by combining many state-level reconstructions with employment weights. Both aim for the same outcome: stable, credible shapes that can be aggregated without losing mass. The idea turned out to be much harder than it looked at first, which is exactly why it became a challenge I enjoyed. What follows is a walk through my own path of thinking along the way.
The project is an interactive page for exploring U.S. wages using “violin” shapes. Each violin encodes the wage distribution of one occupation or group along the horizontal axis. Each violin is scaled by employment, so its area reflects how many workers are in that occupation. Because the wage axis stays fixed, more workers show up as a thicker violin, not a longer one. The violin is thick where many people earn around that wage, and thin where only a few do. The violins are placed vertically using an optimization method that reduces overlap, keeps the layout balanced, and makes labels easier to read.
The main view supports two scales: a linear scale and an adaptive scale. The adaptive scale gives more screen space to wage ranges where the data is crowded, and less space where it is sparse. The adaptive scale, unlike the linear scale, is signaled by variable tick spacing, background shading, and axis label size. A double click or double tap on the axis area switches the scale.
Hovering over (or tapping) an occupation’s violin highlights it and opens a tooltip. The tooltip shows a scaled-to-fit distribution shape, as a violin, PDF, or CDF, together with the key quantiles ✧q10, ✧q25, ✦q50, ✧q75, and ✧q90. To suggest increased uncertainty in the tails, the shape fades out gradually beyond the outer quantiles.
A click or tap pins the tooltip. This switches it from a quick tooltip into a small investigation tool. Once pinned, its distribution view (violin, PDF, CDF) can be cycled with a long press on mobile or a right click on desktop. The CDF view is especially useful for judging, by eye, how well the reconstruction behaves between the quantile anchors.
In the pinned state, a draggable band estimates how many workers fall within a chosen wage interval. Clicking the info label cycles the bandwidth from $500 to $50,000, stepping through 2–5 multiples across powers of ten. Depending on context, clicking the major category header either focuses that major category or returns to the main view. Double-clicking the graph toggles the band between free movement ($100 resolution) and snapped movement, where it jumps in steps equal to the selected bandwidth.
The main view works like a map: it can stay zoomed out for the big picture, or zoom in to explore one major category in detail. In the global view, even a thin slice of vertical space can contain dozens of detailed occupations, so major categories are given distinct colors and are visually separated from one another.
Zooming into a major category is triggered by a double tap or double click on any violin within that category. In focus mode, a double click anywhere on the violin area (except the wage axis) restores the global view. For convenience in global mode, violins are clipped to the major envelope, but hover or tap interaction can still identify violins that are only partly visible.
A context menu provides quick access to common actions, including focusing (zooming) a major category, toggling between linear and adaptive scales, searching for occupations, showing or hiding labels, and adjusting color settings such as contrast, luminosity, intensity, and label colors.
This project aims for a smooth experience without turning the screen into a cockpit. Settings are kept to a minimum on purpose, and anything I consider nonessential starts hidden. Most global settings live in the context menu (long press on mobile, right click on desktop). Others are attached to the tooltip itself, using simple gestures like single tap, double tap, or long press on the tooltip elements.
By default, the wage axis uses a linear scale. Unlike nonlinear options such as logarithmic, square root, or inverse hyperbolic sine, a linear scale preserves distances: equal gaps on screen correspond to equal dollar differences, which keeps comparisons direct. That is why it is usually the preferred choice in data visualization.
When the data becomes too dense to read comfortably, nonlinear scales can create breathing room, but they do it by bending positions and changing how sizes feel. In this project the wage range is truncated at $240,000, so a linear scale is often enough. Still, for categories where most values are crowded toward the lower end, I also provide an adaptive scale.
The adaptive scale is built from the global distribution, but its deformation is tamed so that adjacent tick gaps follow a monotone compression ramp: they start near a baseline spacing, shrink smoothly through the dense region, and then clamp to a constant spacing in the tail. Like any nonlinear scale, it introduces distortion, so the design makes its presence hard to miss: tick spacing changes, background shading shifts, and labels adjust to signal that the axis is warped. The transition is animated in JavaScript to show what is moving where when you switch between linear and adaptive (and back). You can set the scale mode from the context menu, and it defaults to linear. You can also toggle it by double tapping the wage axis area.
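As a sketch of the ramp idea described above: tick gaps shrink with local data density but never drop below a clamped floor. The gap formula and the constants here are illustrative, not the project's actual implementation.

```python
import numpy as np

def adaptive_positions(density, base_gap=1.0, min_gap=0.35):
    """Monotone compression ramp (sketch): each tick gap starts near a
    baseline spacing, shrinks where the data is dense, and is clamped
    to a constant floor in the tail. `density` holds one value per
    tick; `base_gap` and `min_gap` are illustrative constants."""
    d = np.asarray(density, dtype=float)
    span = np.ptp(d) or 1.0
    d = (d - d.min()) / span                          # normalize to [0, 1]
    gaps = base_gap - (base_gap - min_gap) * d[:-1]   # one gap per interval
    return np.concatenate([[0.0], np.cumsum(gaps)])
```

Because the gaps are always positive, the resulting axis mapping is strictly increasing, which is what keeps the warped scale readable.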
In textbook probability, the area under a density curve is 1. In this project, the violins are scaled by worker counts, so the area of a violin grows with employment. The side effect is that some categories, even when they contain lots of occupations, can become almost as thin as a line at the global scale.
To deal with that, you can zoom in on any major category or detailed occupation by double tapping its violin or its label. To return to the global view, double click the violin drawing area (not the axis area, where double click switches between linear and adaptive scales). The context menu also detects the category under the pointer and names the zoom action explicitly, for example, “Zoom Sales …”.
I put significant effort into making the wage map readable. The violin placement algorithm includes an optimization component that tries to expose as many labels as possible without turning the view into clutter.
In focus view, you can choose what percentage of labels to show, which is mainly a way to fine tune how many of the large occupations (those with high worker counts) stay visible. A “hide overlapping labels” option, enabled by default, helps keep the view readable. If you prefer labels that match category colors, there is also a “colored labels” option. It is hidden by default and starts disabled.
Exploration would not feel complete without a natural search feature. Search is always visible in the context menu. As you type the first characters of an occupation name, the interface adaptively hides labels that do not match and highlights the matching substring in those that do.
This behavior applies to detailed occupations. For major categories, a star indicates that the category contains at least one occupation whose name matches the pattern. Matching is case insensitive and Unicode aware.
Pressing “Go” (Enter) navigates to the first match (ordered by number of workers), highlights it, and opens its tooltip. If the match is outside the current focus, the view automatically focuses the appropriate major category first. Clearing the search pattern restores the label visibility state that was active before the search.
As a nonessential feature, global luminosity, contrast, and color intensity can be adjusted. These options are hidden by default in the context menu.
The tooltip is the central visual element of the interface, providing details for a highlighted occupation or major category. A click or tap pins it, and in this pinned state several parts become interactive.
The titles can be used to expand or collapse a category, indicated by the “≺” (expand) and “≻” (collapse) symbols. The draggable band estimates how many workers fall within a wage interval, and shows the amounts inside and outside that interval. This estimate is derived from the reconstructed distribution, not from raw microdata.
Single-tapping the displayed values cycles through a range of bandwidths. Double-tapping the chart toggles band snapping between free movement and steps equal to the selected bandwidth. A long press (right-click on desktop) cycles through three distribution views: the violin (not weighted by worker counts), the PDF (the positive half of the violin), and the CDF, which shows cumulative probability and helps judge the quality of the reconstruction. Because long press is overridden on many devices (for example, TV browsers), discreet buttons are provided as an alternative to gestures.
For this project I used the most recent national OEWS release (full data, 2024) published on the Bureau of Labor Statistics page, from which I downloaded the XLSX file.
For this project I used the following fields.
- OCC_TITLE: occupation name
- OCC_CODE: occupation code, used for building the data hierarchy
- O_GROUP: 'total', 'major', and 'detailed' items are used for filtering data
- AREA_TYPE: used for building PDFs; 2 for aggregating states to the national level, 1 for national (consolidated) data
- A_PCT10, A_PCT25, A_MEDIAN, A_PCT75, A_PCT90: yearly wage quantiles
- TOT_EMP: registered workers for every occupation

Each OCC_CODE follows a "##-####" pattern. The overall total uses the code "00-0000", major categories use "##-0000", and detailed occupations use the full "##-####" form. A detailed occupation and its major category share the same first two digits. Detailed-occupation quantiles are used to reconstruct each violin shape. For a major category, the envelope is built by summing the height contributions of all its detailed occupations along the wage axis, and the overall envelope is built by doing the same over all detailed occupations.
In data terms, the occupations form a tree: the overall total, major categories, and detailed occupations underneath the majors. The hierarchy drives grouping, colors, and focus behavior.
Majors act like chapters that can be expanded/collapsed or focused. Details are leaf nodes shown in dense views or on demand.
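The tree levels can be recovered mechanically from the OCC_CODE pattern. A minimal sketch, with helper names of my own choosing (the level strings follow the O_GROUP field):

```python
import re

def occ_level(code):
    """Classify an OEWS occupation code in the '##-####' pattern.
    Hypothetical helper; level names mirror the O_GROUP field."""
    if not re.fullmatch(r"\d{2}-\d{4}", code):
        raise ValueError(f"unexpected code format: {code!r}")
    if code == "00-0000":
        return "total"          # the overall total
    if code.endswith("-0000"):
        return "major"          # a major category
    return "detailed"           # a leaf occupation

def major_of(code):
    """The major category a detailed occupation belongs to:
    same first two digits, zeroed suffix."""
    return code[:2] + "-0000"
```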
At first, the published quantiles looked like endpoints in themselves: five reference wages per occupation, useful for quick comparisons. But the more the questions shifted toward spread, risk, and tail behavior, the clearer it became that five points are not a distribution. That is the moment the idea started to take shape: the quantiles had to be expanded into an estimated probability distribution over the wage axis. From there, the initial plan was straightforward: reconstruct a smooth, monotone CDF from a small set of interior anchors, then obtain the PDF by differentiation in a way that stays visually stable.
Implementation was anything but simple. I tested several parametric models often used for wage data, fitting the quantiles to families such as LN-Pareto, GB2, LN-GPD (C1 anchored), dPLN (Double Pareto Lognormal), Singh-Maddala (Burr XII), Dagum (Type I), and the Generalized Gamma (Stacy). I also tried more flexible, self-adjusting constructions that enforce monotonicity while still matching the five quantiles, including Hermite based interpolators, rational Ball curves, P-splines, and NURBS, with and without PDF tail tapering toward zero. Many of these worked well for large parts of the dataset, but none was dependable across every occupation, and that need for consistency is what kept the search going.
The main path changed after a key observation: the same five quantiles and employment totals exist for every state. This makes it possible to build a detailed national distribution by aggregation of estimators (model averaging): state level PDFs are reconstructed from state quantiles, then combined into a national PDF by employment weighted summation on a fine wage grid. This follows the practical intuition of the Law of Large Numbers: combining many independent state contributions tends to stabilize the overall shape even when each state reconstruction is only approximate.
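The weighted summation step can be sketched as follows. This is a simplification: in the project, the state PDFs themselves come from the quantile reconstruction described later, and the renormalization shown here is illustrative.

```python
import numpy as np

def national_pdf(state_pdfs, employments, grid):
    """Combine state-level PDFs (rows, all evaluated on the same wage
    grid) into a national PDF by employment-weighted summation, then
    renormalize so the mixture integrates to 1 on the grid."""
    w = np.asarray(employments, dtype=float)
    w = w / w.sum()                                  # employment weights
    mix = w @ np.asarray(state_pdfs, dtype=float)    # weighted sum per grid point
    # Trapezoidal mass on the grid, used to renormalize.
    mass = np.sum(0.5 * (mix[1:] + mix[:-1]) * np.diff(grid))
    return mix / mass
```

Because every component is evaluated on the same grid, the mixture is just a weighted average of curves, which is what lets many rough state reconstructions average into a stable national shape.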
Once implemented, this produced the detailed national shapes for almost the entire workforce (97%). Thus, the heavy optimization route, once the initial focus, became plan B rather than the core method.
In the dataset, each occupation comes with only five quantiles that can be used to reconstruct a violin: A_PCT10, A_PCT25, A_MEDIAN, A_PCT75, A_PCT90 at probabilities p = [0.10, 0.25, 0.50, 0.75, 0.90]. But to draw a full violin shape, and especially to build the outer envelope contours for major categories, I also need endpoints. That means inferring q0 and q100 for p = [0.0, 1.0].
I tried several extrapolation ideas, from heuristic rules to statistical and numerical approaches: linear and log-linear slope trends, geometric progression style rules, and Tukey fence bounds. They tended to fail for the same reason. With only five points, the outer gaps matter a lot, and many methods either become too sensitive to those gaps or depend on choices that are hard to justify.
The approach I settled on extends the quantile curve beyond the [q10, q90] range while keeping the next goal in view: the CDF must stay smooth and strictly increasing. The steps are: compute monotonicity-preserving Fritsch-Carlson slopes from the five known quantiles; extrapolate additional slopes toward p = 0 and p = 1 (with a log fallback to keep them positive); convert those endpoint slopes into provisional q0 and q100; then recompute slopes using all seven points and iterate a few times until the endpoints and slopes agree with each other. These endpoint estimates are then treated as provisional anchors for the next modeling step.
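A simplified sketch of that fixed-point loop, assuming increasing wage quantiles. The geometric slope rule (`sec1**2 / sec2`, a log-space extrapolation that stays positive) and the 50/50 blend are illustrative stand-ins for the full Fritsch-Carlson scheme, not the project's exact formulas.

```python
import numpy as np

P = np.array([0.10, 0.25, 0.50, 0.75, 0.90])  # published quantile levels

def extend_endpoints(q, n_iter=6):
    """Sketch of the endpoint iteration: propose q0 and q100, recompute
    secant slopes over all seven points, and repeat until the endpoints
    and slopes agree."""
    q = np.asarray(q, dtype=float)
    q0, q100 = q[0], q[-1]                        # provisional anchors
    for _ in range(n_iter):
        p7 = np.concatenate([[0.0], P, [1.0]])
        x7 = np.concatenate([[q0], q, [q100]])
        sec = np.diff(x7) / np.diff(p7)           # secant slopes
        # Extrapolate endpoint slopes from the nearest interior secants;
        # the geometric rule keeps them strictly positive.
        m_lo = max(sec[1] ** 2 / sec[2], 1e-9)
        m_hi = max(sec[-2] ** 2 / sec[-3], 1e-9)
        # Blend the extrapolated slope with the current endpoint secant,
        # then convert the agreed slope back into endpoint positions.
        q0 = q[0] - 0.5 * (m_lo + sec[0]) * P[0]
        q100 = q[-1] + 0.5 * (m_hi + sec[-1]) * (1.0 - P[-1])
    return q0, q100
```

The blend makes each pass a relaxation step, so the loop converges geometrically to endpoints whose secants match the extrapolated slopes.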
Honestly, I expected this part to be straightforward. I thought the hard work was in the previous step, estimating q0 and q100. Once I had endpoints, it felt like the rest should follow naturally from monotone cubic Hermite interpolation with slope limiting (for example, in the Fritsch-Carlson style). It keeps the curve monotone, stays smooth, and avoids overshoot.
What I did not expect was what happens after differentiation. The CDF looked fine, but its derivative, the PDF, was not C1 smooth, and it often behaved strangely near q0 and q100. The tails in particular ended in ways that did not look right, which made me doubt the endpoint estimates themselves.
From there I started crossing methods off the list, one by one: univariate splines, generic splines, penalized splines, integrated splines, parametric splines, and NURBS (with and without tolerances). I tried different optimizers and different goals, like smoothness, goodness of fit at the five quantiles, unimodality constraints, and forcing the PDF to taper to zero at both ends.
At one point I nearly settled on a custom monotone transformation that takes an initially smooth raw PDF and reshapes it into a unimodal form with tails tapered to zero. I also looked closely at how that kind of transformation flows through derivatives into the final shape. That work turned into a useful byproduct that I may publish separately, because it touches a broader question: monotone transformations that preserve a geometric notion of variation in the data. Still, even when the result looked good, the fit became too permissive, so I went back and tried classic wage families instead. That is when I tested the parametric models and the results were even less satisfying than the smoothing approaches. At that point it was clear this was the second challenge, and it was going to be more complex than the first.
After the earlier attempts failed, I moved to a more involved route: optimization. The five reported quantiles are treated as noisy observations. Instead of forcing the curve to pass through them exactly, I keep them inside a tolerance band, which is closer to how the data should be read in the first place.
The optimizer searches within a low-dimensional family of monotone spline or shape bases for x(p). For each candidate, it evaluates the curve on a dense p grid, then derives the implied PDF. There are safeguards to avoid near-vertical stretches in x(p), because those would turn into sharp spikes in the PDF.
The objective is a weighted combination of three things: (1) quantile residuals, handled as banded penalties, (2) smoothness of x(p), measured through curvature or second difference energy, and (3) anti-needle terms that discourage very small dx/dp or extreme peak-to-tail ratios. In some cases it also allows a small drift in the effective p locations of the outer anchors, so the fit does not have to force an implausible amount of tail mass.
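In outline, such an objective might look like the sketch below. The penalty forms, weights, and tolerances are illustrative stand-ins for the project's actual terms.

```python
import numpy as np

def objective(x, p_grid, q, p_anchor, tol, w=(1.0, 0.1, 0.05)):
    """Sketch of the three-part objective: (1) banded quantile
    penalties, (2) second-difference smoothness of x(p), and (3) an
    anti-needle term on dx/dp. Weights `w` and `tol` are illustrative."""
    x = np.asarray(x, dtype=float)
    # (1) banded residuals: free inside +/- tol, quadratic outside.
    xq = np.interp(p_anchor, p_grid, x)
    excess = np.maximum(np.abs(xq - q) - tol, 0.0)
    fit = np.sum(excess ** 2)
    # (2) smoothness: second-difference energy of x(p).
    smooth = np.sum(np.diff(x, 2) ** 2)
    # (3) anti-needle: tiny dx/dp means near-vertical x(p), which
    # turns into sharp spikes in the implied PDF.
    dxdp = np.diff(x) / np.diff(p_grid)
    needle = np.sum(1.0 / (dxdp + 1e-9))
    return w[0] * fit + w[1] * smooth + w[2] * needle
```

A curve whose implied quantiles sit inside the tolerance band pays nothing on the fit term, which is exactly the "noisy observations" reading of the published values.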
Tail tapering happens after the core fit. Each tail is handled with a conservative post-process: first a smooth Hermite-like tip is appended so the PDF decays to zero beyond the current endpoint. Then a monotone nonlinear horizontal stretch is applied, strong near the far tip and minimal near the anchor, to preserve the integrated tail mass. This keeps the interior CDF essentially unchanged, but turns blunt cutoffs into a controlled fade to zero. The result avoids wall-like endings and replaces them with a taper that looks credible and also signals uncertainty.
After the reconstruction methods were settled, the pipeline became deterministic. For each detailed occupation, state rows are processed first. A state-level PDF is reconstructed from the five published quantiles and state employment, evaluated on a common wage grid, and then all state PDFs are combined into a national detailed PDF by employment-weighted summation along that grid.
A final KDE smoothing pass is applied after aggregation. State PDFs are computed on a fixed grid and are produced via monotone interpolation plus numerical differentiation, so each component can carry small grid-scale kinks. When many such approximate components are stacked, those kinks can accumulate into visible wobble even when the underlying shape is plausible. The KDE step smooths the aggregated curve only at the chosen bandwidth scale, removing discretization and reconstruction noise while preserving total mass (the PDF is renormalized after smoothing). It is not used to invent structure, but to produce a readable national density from the summed state contributions.
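The smoothing pass can be sketched as a Gaussian kernel convolution followed by renormalization; the bandwidth handling and truncation here are illustrative.

```python
import numpy as np

def smooth_pdf(pdf, grid, bandwidth):
    """Gaussian smoothing of an aggregated PDF on a uniform grid,
    followed by renormalization so total mass is preserved. Sketch of
    the post-aggregation KDE pass; the 4-sigma truncation is a
    conventional choice, not the project's stated value."""
    dx = grid[1] - grid[0]
    half = int(np.ceil(4 * bandwidth / dx))        # truncate at 4 sigma
    t = np.arange(-half, half + 1) * dx
    kernel = np.exp(-0.5 * (t / bandwidth) ** 2)
    kernel /= kernel.sum()                         # unit-mass kernel
    out = np.convolve(pdf, kernel, mode="same")
    out /= out.sum() * dx                          # renormalize mass to 1
    return out
```

Only wiggles at or below the kernel scale are removed; the renormalization at the end is what keeps the step mass-preserving rather than structure-inventing.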
Coverage is treated explicitly. If the available state rows account for at least 95% of national employment for that occupation, the state-based mixture is kept as it is. For the intermediate range above 90%, the mixture is gently warped so its implied quantiles match the published national anchors, while preserving the overall shape. If state coverage falls below 90%, the state mixture is discarded and the detailed distribution is built directly from the five quantiles using the optimized smooth model described earlier.
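The routing reduces to a small decision function. The 95% and 90% cutoffs are the ones stated above; the returned strings are descriptive placeholders, not the project's identifiers.

```python
def choose_route(coverage):
    """Pick a reconstruction route based on the share of national
    employment covered by available state rows (0.0 to 1.0)."""
    if coverage >= 0.95:
        return "keep state mixture as is"
    if coverage >= 0.90:
        return "warp mixture quantiles to national anchors"
    return "fit directly from the five national quantiles"
```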
In the final pipeline, the major categories and the overall distribution are not reconstructed from their own quantiles. Instead, they are computed as employment-weighted sums of the underlying detailed PDFs along the same wage grid. This preserves mass by construction and guarantees that every aggregate shape is consistent with the detailed components that define it. Accordingly, each detailed PDF is scaled by the number of registered workers before summation, and each resulting violin is drawn at a scale proportional to its workforce.
Once my geometry was ready, I had to decide how to place the violins. At the time, this felt like the “only” remaining challenge. That was a few months ago, when I first saw a visualization that encoded the same data with bubbles: occupations were positioned by their median wage, and bubble size represented the number of workers. It used a D3 beeswarm layout, basically a d3-force simulation that prevents overlaps and keeps the swarm compact.
I am a fan of beeswarms, and I have authored statistically driven density dot plots and density bars, but I am not a fan of D3 beeswarms. Not because they are poorly designed or unattractive. They are a beautiful piece of code by Mike Bostock. The issue is that they do not guarantee, or even try, to reconstruct the underlying distribution. They are physics driven arrangements, not statistically driven ones.
In my view, collapsing an entire occupation into a single median dot is the kind of shortcut the Grammar of Graphics makes easy to justify. It may not look like a major leap from a numerical summary, but it invites the wrong reading on both axes. On the wage axis, the real distribution spans far beyond the bubble. On the density axis, the swarm suggests structure it never actually reconstructs. This reminded me of something GoG promoters often overlook: the Grammar of Graphics is, at its core, a synthetic programming framework built for flexibility, not a blank check for arbitrary graphical combinations of numeric variables. Even its author acknowledges this in the book:
"This system is capable of producing some hideous graphics. There is nothing in its design to prevent its misuse."
Leland Wilkinson, The Grammar of Graphics (p. 15)
Let’s be clear: once I committed to using the actual violin geometry, there was no realistic way to get a perfectly non-overlapping arrangement, unless the violins happened to fit together like a jigsaw I did not know existed.
So I accepted the constraint and designed a placement algorithm with a clear set of priorities, in this order: minimize overlap, cover the major envelopes well, keep the large shapes visible, and make labels readable. The key point is that this is not a one-off trick that works only because this dataset is “nice”. It is a general optimization idea, and it adapted well across all 22 major groups and the overall “All Occupations” view.
After countless iterations, tuning, and mixing ideas, the final layout ended up better than I expected. To control visibility as much as possible, I also leaned on drawing techniques that matter when the view gets dense: grouping, size-dependent transparency, careful z-ordering, clipping paths, and subtle edging that varies with size.
Particular attention was given to the 22-category palette. With so many categories, achieving strong perceptual separation across the set is inherently difficult. The palette is therefore engineered for local contrast, maximizing perceptual distance between neighboring major categories so adjacent regions remain clearly distinguishable.
The result is the violin layout you are exploring right now.
The shapes I built are called violins, a term associated with probability distributions, but here they should be read as plausibility shapes. When the input is quantiles, the curve is not measuring frequency directly. It is a reconstruction that has to earn credibility by obeying constraints, staying stable across cases, and avoiding confident detail where the data provides none.
To many people I pass as a programmer, but my job was mostly a means to reach my real goal: numerical expressivity. From an early age, math was a guilty pleasure. Guilty not because it was wrong, but because it came easily to me, was endlessly fascinating, and I could not really talk about it with my football teammates. So it stayed private. It was mine, mine alone, beyond strict formulas or theorems.
That habit never really went away. It just grew up and found a place where it could be useful. When you care about numerical shape, you stop treating “works on my example” as proof. You start treating it as an invitation to test, to probe, and to see how far the idea can be trusted. That is the mindset I brought into this project.
This project was slow work, and it truly looked like overthinking while it happened. Part of that is my own cognitive profile. “Nearly correct” does not calm me down. It usually triggers more determination than relief, because for me “good enough” often reads as “accidentally good,” not as “close to the actual solution.” The cases that worry me most are the ones that succeed often enough to look general, while still hiding a flaw in a narrow family of data. So I keep pushing until the method holds for those cases too, not just for the easy majority.
For me, when a solution holds entirely, and still holds under extreme test cases that were not even part of the dataset, I stop exploring. I delete variants, keep one path, and keep the dirt in the workshop. The app might look simple because the viewer sees the product in the window, not the cleaning process behind it.
Many wage charts treat pay like a scoreboard: one number, maybe two, and we call it a day. This project is my personal pushback. Wages are not a single fact. They are a spread, mainly a spread. For anyone who cares about jobs, budgets, hiring, negotiating, or simply why two people with the same title can live in different worlds, seeing the spread is the point.
This is not about being clever with graphics. It is about being honest to the numerical shape. For me, the goal is a view where the data stays visible, interaction helps the reader, and every choice in math, layout, styling, and UI earns its place by making interpretation clearer without changing meaning.
I built this wages map to keep an honest picture of what occupations look like in practice. Not as a single number, but as a full profile. A violin is a compact way to hold several everyday questions in one place: placement on the wage axis, spread, and what the extremes look like.
The first thing I read is placement on the horizontal axis. Left to right gives an immediate sense of how occupations compare on pay. But placement is only the start. The shape carries the more useful information. Where the violin is thick, many workers cluster. When most of the width sits in a compact band, outcomes feel more uniform. When the width is stretched across a wide band, outcomes feel more dispersed, even within the same title.
Sometimes the thickness is not organized around a single “fat part”. A violin can show two bulges or a clear shoulder. I read that as a likely mixture inside the same occupation: different specializations, seniority ladders, regions, employer types, or pay systems that sit on top of each other. It is still one title, but it does not behave like one uniform market.
The ends of the shape are where the map becomes concrete about downside and upside. A longer left end means low outcomes are visibly part of the profile, not just an exception. A longer right end means high outcomes exist, but the important detail is how quickly the shape becomes thin as it moves right. Fast tapering suggests tighter limits. Slow tapering suggests more room for extremes, and also a reminder to notice how rare those extremes look.
This is also how the map makes steady versus gamble readable without any special vocabulary. Some occupations put most of their width into a narrow middle and taper quickly, so the picture suggests many people land in roughly the same zone. Others span a wider range and keep long, thin tails, so the picture suggests outcomes can separate more sharply between low, common, and high.
When violin size reflects workforce, size adds a different kind of context. Bigger shapes mean a larger world of workers and employers. Smaller shapes mean fewer seats. That does not tell anything about pay quality by itself, but it changes how accessible a field may feel, and it helps interpret whether an occupation is broad and common or narrow and niche.
Put together, these cues support the comparisons I want this map to make easy: not only “more or less pay,” but what looks common, what looks rare, how wide outcomes seem, how visible the downside is, how plausible the upside feels, and whether the occupation is a large field or a small one.
A lot of visualization methods look smooth for the same reason some stories look neat: they edit out the awkward parts. I tried not to do that. When I smooth or interpolate, it is not to make the picture prettier. It is to make it stable, readable, and constrained in the right ways. If a method can overshoot, break monotonicity, or add little wiggles that suggest extra structure, I treat it as a bug, not a feature. The rule is simple: do not manufacture information. If I compress, warp, or transform, it has to be explicit and reversible, so the axis is still something you can trust.
The placement problem is where “a chart” turns into “a system”. Violins are wide, irregular shapes. They do not behave like points. Once I committed to real geometry, I needed a layout method that accepts that constraint instead of pretending it is not there. So I wrote a placement algorithm that treats legibility as a first class goal. Its priorities are, in order: minimize overlap, cover the major envelopes well, keep the larger and more consequential shapes visible, and make labels readable. That order is not just technical. It is a statement about what a viewer needs to keep their bearings.
The rendering is built to support scanning. Major groups are separated through color and structure so the overview reads like an organized field, not a pile of marks. Clipping keeps the global view calm, but identification still works even when items are partly clipped, because hiding geometry should not mean hiding access. Edges get as much care as fills. Subtle strokes, thickness variation, and transparency are used to keep dense regions readable. Small shapes stay present without shouting. Large shapes carry weight without taking over the page.
The UI is built around one idea: exploration should be easy enough that you actually do it. The global view is the index. Focus mode is the reading lens. A thin slice of the global scale can contain dozens of occupations, and interaction is what makes that density usable. A double tap or click focuses a major category, and getting back is equally direct. Tooltips are not decoration. They are local explanations: compact, consistent, and designed to keep the same meaning while changing the level of detail. You can move from “where am I?” to “what does this shape mean?” without losing context.
The project is not about making wages look clever. It is about making the distribution usable: something you can read, compare, and navigate, without collapsing it into a single number, a slogan, or a pretty illustration. And yet, once the distribution is allowed to show itself, the result can be genuinely beautiful, not because of styling tricks, but because the structure was always there, hidden under a flat, inexpressive net.
I started this work aiming for a static visualization, essentially an alternative to a beeswarm. It gradually turned into an interactive layout, and that proved to be the better direction. Working under the constraint of a single, self-contained file that runs anywhere HTML and JavaScript can render pushed the project in a few unexpected ways. The first versions were close to 30 MB. Through geometric optimization, I brought it down to about 4 MB without removing the core information.
The transitions are not the smoothest you will ever see, but they remain acceptable even on my older mobile devices. Keep in mind that this project does not rely on any heavy server-side computation. It is a serverless, JavaScript-based dynamic page. The transitions are not there for artistic effect. They exist to keep a clear link between visual states, a bridge that helps the viewer track what changed.
The work naturally split into three parts: reconstructing the violins, arranging the layout, and building the graphical dynamics and interaction. I used Python for the first two and for generating the primary graphical entities. I used HTML, CSS, and vanilla JavaScript for the interactive layer, with no external libraries.
Python, along with R, has basically become the common language of scientific data work. Coming from more than 30 years of C++, the hard part was not the syntax. It was the mindset shift. Once you start thinking in arrays, vectorization, and “let the data flow through operations,” a lot of your old habits stop being useful, and some become actively unhelpful.
Python is not a speed first language, and it is not built to make programmers feel virtuous. Code can get messy. Many of the low level memory and performance tricks that feel natural in C++ are either hidden or just not available in the same direct way. But the ecosystem is hard to argue with. So much of the heavy lifting is already implemented, compiled, and tuned, which means you can do serious numerical work without spending half your time wrestling the toolchain.
And then there is the current wave of development AI. It is not a replacement for clear thinking. If you do not already know how to program and validate results, it can confidently lead you into nonsense. But if you know what you are trying to build and you have good checks, it can be genuinely useful, like a fast assistant who still needs supervision.
The violin geometry and the layout are built in Python, exported as SVG, and then embedded into a single HTML file where every element can be manipulated interactively. CSS handles the look. JavaScript drives the behavior.
One early lesson was that the DOM is a terrible place to keep asking the same questions. So at startup, the JavaScript reads the SVG metadata and builds an internal, stateful model of the scene. From then on, interaction does not depend on repeatedly querying the DOM for data. The code can update styles, visibility, and hierarchy with minimal DOM churn, which matters when there are hundreds or thousands of paths on screen.
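The pattern can be sketched with plain objects standing in for parsed SVG metadata. Everything here is illustrative (function name, record fields, the `data-*`-style schema); it shows the shape of the idea, not the project's actual code.

```javascript
// Sketch: build a stateful scene model once at startup, instead of
// querying the DOM repeatedly. Records stand in for parsed data-*
// attributes on SVG paths; all field names are illustrative.
function buildSceneModel(records) {
  const byId = new Map();
  const byMajor = new Map();
  for (const rec of records) {
    const node = { ...rec, highlighted: false }; // interaction state lives here
    byId.set(rec.id, node);
    if (!byMajor.has(rec.major)) byMajor.set(rec.major, []);
    byMajor.get(rec.major).push(node);
  }
  return {
    get: (id) => byId.get(id),                 // O(1) lookup, no DOM walk
    group: (major) => byMajor.get(major) || [],
  };
}

// Usage: in the real page, records would come from reading SVG metadata once.
const model = buildSceneModel([
  { id: "occ-1", major: "Healthcare", employment: 120000 },
  { id: "occ-2", major: "Healthcare", employment: 45000 },
  { id: "occ-3", major: "Legal", employment: 30000 },
]);
```

After this point, hover and focus logic can consult the model and touch the DOM only to write styles, never to read data back out.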
Coordinate handling is the other big piece. The view supports both linear and adaptive wage scales, and JavaScript applies the change as a reversible warp, not as a second separate rendering. Paths are transformed in place, so the same visual entities stay the same entities across modes. That makes transitions smoother and interaction more stable, because hover and focus always refer to the same underlying object, not to a regenerated copy.
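One way to make such a warp reversible is to build it as a strictly increasing piecewise-linear map between anchor positions, so the inverse always exists. The sketch below uses invented anchor values; the project's actual anchors and interpolation details may differ.

```javascript
// Sketch of a reversible axis warp: a monotone piecewise-linear map
// between source anchors (linear wage axis) and destination anchors
// (adaptive axis). Because it is strictly increasing, the inverse
// exists and a mode switch can be undone exactly.
function makeWarp(srcAnchors, dstAnchors) {
  function interp(xs, ys, v) {
    if (v <= xs[0]) return ys[0];
    if (v >= xs[xs.length - 1]) return ys[ys.length - 1];
    let i = 1;
    while (xs[i] < v) i++;                       // find enclosing segment
    const t = (v - xs[i - 1]) / (xs[i] - xs[i - 1]);
    return ys[i - 1] + t * (ys[i] - ys[i - 1]);  // linear within segment
  }
  return {
    forward: (x) => interp(srcAnchors, dstAnchors, x),
    inverse: (x) => interp(dstAnchors, srcAnchors, x),
  };
}

// An illustrative warp that stretches low wages and compresses high ones.
const warp = makeWarp([0, 30000, 60000, 250000], [0, 300, 500, 800]);
```

Applying `forward` to every x-coordinate of a path, and `inverse` to get back, is what lets the same path object survive a scale change instead of being regenerated.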
Interaction itself is implemented as a small state machine. Pointer movement identifies what is under the cursor, then updates highlights and tooltips without rewriting the whole scene. The tooltip is treated as a primary reading surface. It can be pinned, it supports multiple internal views of the same distribution, and it responds to gestures and context actions. Its contents are not static labels stuck onto the SVG. They are computed and updated live from stored distribution data, including tick generation, bandwidth changes, and the mini plot rendering.
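A state machine like this can be written as a plain transition table. The states and events below are a simplified illustration of the pattern, not the project's actual set.

```javascript
// Sketch: interaction as a tiny state machine. The current state plus an
// event determines the next state; unknown events leave the state alone,
// which keeps pointer noise from corrupting the interaction.
// States and events are illustrative simplifications.
const transitions = {
  idle:   { enter: "hover" },
  hover:  { leave: "idle", pin: "pinned" },
  pinned: { unpin: "hover", leave: "pinned" }, // pinned tooltip survives leave
};

function step(state, event) {
  const row = transitions[state] || {};
  return row[event] || state;
}
```

The payoff is that repaint logic can key off state changes only: if `step` returns the same state, nothing needs rewriting.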
Focus mode is another place where JavaScript does real work. When a major category is focused, the code re-parents the relevant SVG groups into a dedicated wrapper, applies the clipping rules for that mode, and rescales the geometry to fit the plot box. It is done in a way that keeps alignment with the axes consistent, and it stays compatible with later mode changes like switching between linear and adaptive scales.
A lot of effort also goes into robustness, the unglamorous part that decides whether a tool survives contact with the real world. The code handles resizing, device differences, and browser zoom behavior so labels and interaction remain usable. It includes safe bounding box measurement and defensive geometry helpers, because in SVG the difference between “works” and “works everywhere” often lives in edge cases like fonts, transforms, and fractional pixels.
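"Safe bounding box measurement" can be as small as a wrapper that refuses to propagate browser quirks. The helper below is a sketch of that idea: in some browsers `getBBox()` throws for detached elements or returns non-finite numbers, and neither failure should reach layout code. In the page it would receive real SVG elements; here any object exposing `getBBox()` works, which also makes it testable.

```javascript
// Sketch: defensive bounding-box measurement. Never lets a throwing or
// NaN-returning getBBox() escape into layout math; callers always get
// a usable rectangle.
function safeBBox(el, fallback = { x: 0, y: 0, width: 0, height: 0 }) {
  try {
    const box = el.getBBox();
    const ok = [box.x, box.y, box.width, box.height].every(Number.isFinite);
    return ok ? box : fallback;
  } catch (e) {
    return fallback; // detached or unrenderable element
  }
}
```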
Finally, the JavaScript layer is deliberately dependency free. No external libraries makes the file portable and stable, but it also means the usual conveniences are replaced with custom, purpose built utilities: path parsing and rewriting, tick construction, gesture handling, caching, and small performance optimizations to keep the experience responsive on older devices.
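As one example of such a purpose-built utility, tick construction without a library usually means choosing a "nice" step of 1, 2, or 5 times a power of ten so labels land on round wages. The function below mirrors that standard technique; it is not the project's exact code, and the thresholds are conventional choices.

```javascript
// Sketch: dependency-free axis tick construction. Picks a step of
// 1, 2, or 5 times a power of ten near span/targetCount, then emits
// round values covering [min, max].
function niceTicks(min, max, targetCount = 5) {
  const rawStep = (max - min) / Math.max(1, targetCount);
  const mag = Math.pow(10, Math.floor(Math.log10(rawStep)));
  const norm = rawStep / mag;                    // in [1, 10)
  const step = (norm < 1.5 ? 1 : norm < 3.5 ? 2 : norm < 7.5 ? 5 : 10) * mag;
  const ticks = [];
  for (let v = Math.ceil(min / step) * step; v <= max + 1e-9; v += step) {
    ticks.push(Math.round(v * 1e6) / 1e6);       // avoid float drift in labels
  }
  return ticks;
}
```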
Data visualization was the central goal of this project. The interface and rendering choices were guided by well known perceptual principles, applied in concrete ways:
1. Common region. Each major category has its own “container” on the page, so its violins read as a group at a glance. Shapes are also pre-clipped offline, which keeps runtime work low.
2. Continuity. In focus mode the violins are shown unclipped, and gentle transparency plus edge variation lets your eye follow the full outline without losing the overall form. In the global view, shapes are clipped for readability, but the highlight still reveals how they continue beyond the clipping boundary.
3. Closure. In the global view, violins are clipped by their major envelope and sometimes by larger neighbors, but the mind still reads a complete shape from what remains. Hover highlighting briefly reveals the full outline, and that short glimpse helps the eye “close” the shape even after the highlight is gone.
4. Proximity. Categories are kept a small distance apart, so violins within a category feel related while cross-category mixing stays rare.
5. Figure and ground. The scene stays visually clean: violins remain the main objects, and extra marks are avoided when they would steal attention or add interpretive noise. Labels, when enabled, are placed to stay out of the geometry as much as possible, because reading text and decoding shapes are different kinds of work.
6. Invariance. Sizes and shapes vary a lot, but the encoding never changes, so every violin is immediately recognized as the same kind of object doing the same job.
7. Prägnanz. A dense view stays readable only if it stays simple. Overlap and clipping are used on purpose to reduce clutter, while details remain easy to retrieve on hover or focus.
8. Similarity. Labels are styled to “belong” to their shapes. Label size tracks violin area using a compressed scaling, so large categories stand out without the small ones disappearing. This helps connect name and shape quickly, by both position and size.
9. Order. The layout uses spacing and alignment to stay organized. Gaps are not wasted space: they create a readable structure that guides the eye, keeps many labels legible, and makes vertical scanning feel natural.
10. Common fate. Even though each occupation has its own shape, the violins share a familiar left-to-right silhouette: a fuller body and a tapering tail. That makes them easy to interpret at a glance, showing where the mass sits and how quickly it thins out. The same idea guides the warping and focus animations, so motion clarifies structure instead of adding decoration.
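The compressed label scaling mentioned under Similarity can be sketched as a power law with an exponent below 1, followed by clamping. The exponent and pixel bounds here are illustrative choices, not the project's tuned values.

```javascript
// Sketch: label size tracks violin area through a compressive power law
// (exponent < 1), then is clamped to a readable pixel range. All the
// constants below are illustrative, not the project's actual tuning.
function labelSize(area, refArea, refSize = 14, exponent = 0.35,
                   minPx = 8, maxPx = 28) {
  const size = refSize * Math.pow(area / refArea, exponent);
  return Math.min(maxPx, Math.max(minPx, size));
}
```

With an exponent of 0.35, a category 100 times larger by area gets only about five times the label size before clamping, so big categories stand out without drowning the small ones.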
Other considerations.
White edges are used instead of dark ones to keep the scene light and to separate neighboring shapes. Transparency is kept moderate so overlaps stay readable. Designing a 22-category color palette is a real challenge, so the palette is tuned for clear local separation, especially between major groups that sit next to each other.
The guide is written to scan well: clear chapters and sections, distinct note and warning blocks, one sans-serif for text, one monospace for code and formulas, and three heading levels that make the structure easy to use.
Performance and stability matter here because the page is meant to be used, not just viewed. With many shapes on screen, constant pointer movement, frequent highlighting, tooltip updates, band dragging, focus transitions between global and major views, or scale warping, interaction has to stay steady. If it lags or the geometry drifts, the effort to decode the view stops being worth it. However, because it is a JavaScript-driven page with heavy interaction and frequent updates, it runs best on a reasonably recent device.
The work is split on purpose. The expensive part happens offline in Python: reconstructing each violin from sparse quantiles under strict constraints, generating clean geometry, and solving the layout as an optimization problem. Those steps are heavy by nature, but they run once, with full numerical control and room for validation. The output is not just a picture. It is a prepared scene: paths, hierarchy, positions, and the metadata needed for interaction.
JavaScript plays a different role. It does not rebuild distributions or re-solve the layout. It treats the exported SVG as a precomputed dataset and focuses on interface work: what to show, how to style it, and how to move between visual states. Even the adaptive scale is handled as a mapping computed from prepared anchors and applied as a reversible warp. Switching modes is a geometry transform, not a recomputation. That keeps the underlying objects stable across states, which makes hover, focus, and tooltip logic reliable.
This division keeps runtime cost bounded. Most interactions are incremental updates: toggling clipped versus unclipped views, updating a highlight overlay, rewriting only the paths that need warping, and updating tooltip contents. The code is structured to avoid expensive full redraw behavior and to minimize DOM work, because in a dense SVG that is usually the real bottleneck. The result is not perfect smoothness, but it is consistent behavior across devices, including older mobile hardware, which matters more for a reading tool than chasing ideal animation.
Short definitions for terms used throughout the project.
Function: a rule y = f(x) from input x to output y.
First derivative: f'(x), the local rate of change (slope) of a function.
Second derivative: f''(x), describes how the slope changes, used as a proxy for curvature.
Curvature: how sharply a graph bends; for y = f(x) it relates to f'' and f'.
Parametric curve: C(x(t), y(t)) with a parameter t, useful when y is not a simple function f(x).
CDF (cumulative distribution function): F(x) = P(X <= x), monotone nondecreasing in x.
Quantile function: Q(p) = F^{-1}(p), maps a probability p to a value (wage).
PDF (probability density function): f(x) = dF/dx when it exists, integrates to 1 over the domain.
CDF from PDF: F(x) = ∫_{-∞}^{x} f(t) dt.
PDF from CDF: f(x) = F'(x) in regions where F is differentiable.
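The last two relationships are what quantile-based reconstruction leans on in discrete form: between adjacent quantiles, the average density is the probability step divided by the wage step, f ≈ Δp / ΔQ. A minimal sketch of that idea (the quantile values are invented, and the real reconstruction applies smoothing and constraints on top of this):

```javascript
// Sketch: a piecewise-constant density from sparse quantiles, using
// f ≈ Δp / ΔQ between adjacent quantile points. Input values invented.
function densityFromQuantiles(ps, qs) {
  const bins = [];
  for (let i = 0; i + 1 < ps.length; i++) {
    bins.push({
      lo: qs[i],
      hi: qs[i + 1],
      density: (ps[i + 1] - ps[i]) / (qs[i + 1] - qs[i]),
    });
  }
  return bins;
}

// Probability mass recovered by integrating the step density.
function totalMass(bins) {
  return bins.reduce((sum, b) => sum + b.density * (b.hi - b.lo), 0);
}

const bins = densityFromQuantiles(
  [0.10, 0.25, 0.50, 0.75, 0.90],        // probabilities
  [28000, 35000, 46000, 63000, 88000]    // wages (invented)
);
```

Integrating the step density over the covered range recovers exactly the probability between the outer quantiles (here 0.8), which is the "no mass lost" property the smoother reconstruction must also preserve.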